We study the detection of the t' of a fourth family during the early running of LHC
with 7 TeV collision energy and 1 fb^-1 integrated luminosity. By use of a neural
network we show that it is feasible to search for the t' even with a mass close to
the unitarity upper bound, which is in the 500 to 600 GeV range. We also present
results for the Tevatron with 10 fb^-1. In both cases the search for a fourth family
quark doublet can be significantly enhanced if one incorporates the contribution
that the b' can make to a t'-like signal. Thus the bound on the mass of a degenerate
quark doublet should be stronger than the bounds obtained by treating t' and b' in
isolation.
The LHC has begun its historic mission as it collects at least 1 fb^-1 of data at 7 TeV
collision energy. For the sequential fourth family model [1, 2] the search for heavy quarks at
the LHC will be the essential test, although other low energy experiments like LHCb [3] can
help to bound the model parameters. The fourth family contains the doublet (t', b'), and
when their masses are in the 500 to 600 GeV range they are close to an upper bound imposed
by partial wave unitarity [4]. Fourth family quark masses anywhere close to this range would
have significant implications for electroweak symmetry breaking and flavor physics [5].

We shall assume that the dominant decay modes for these two heavy quarks are t' -> Wb
and b' -> tW -> WWb, which is consistent with current bounds [6, 7]. It is known that
multivariate analysis methods [8], like the neural network method and the boosted decision
tree method, can be quite useful to separate signal from background. They have been used
successfully in the top quark precision measurements [9] and for single top production at
the Tevatron [10, 11]. In our analysis we use the MLP (Multi-Layer Perceptron) neural network
method available in the TMVA package [8].

A feature of t' decay is the production of highly boosted and isolated W's. The hadronic
decay of such W's produces W-jets, and their use in a simple reconstruction of the t' was
explored in [12-14]. (The W-jets can also be searched for on their own.) Since these
results were encouraging we were prompted to consider how a more conventional full
reconstruction method could be enhanced with a neural network. Recently a similar approach
was used for the more difficult process pp -> b'b' in [15] at 10 TeV and 1 fb^-1 with
m_b' = 600 GeV. The sensitivity was found to compare favorably with the same-sign lepton
mode.

The CDF collaboration with 4.6 fb^-1 of data has produced the bound m_t' > 335 GeV by
using a two dimensional fit to the (M_rec, H_T) distribution [16]. An early investigation [17]
of a t' search at LHC would have very pessimistic implications for the success of a t' search
in the unitarity region in the first early running data. A more recent study of a vector-like
quark decay T -> Wb in [18], also carried out at 14 TeV, has less pessimistic conclusions.
In that study two tagged b jets were required and a likelihood discrimination analysis was
adopted.

In this work we revisit the sensitivity to t' during the early running of LHC by proposing 
a new reconstruction method in association with a neural network. We account for the 
effect that b'b' production can have on a t' search, which has not been done elsewhere. 
Depending on how the signal is defined, the b' can strengthen a signal above standard model
backgrounds. In our analysis we assume degenerate t' and b' masses. The impact of the b' on
the t' analysis will only grow if the former has a smaller mass, as is often assumed, and thus 
we are providing a conservative estimate of this impact. Degenerate quark masses have not 
yet been ruled out in a model independent analysis. For example it is often assumed that the 
fourth neutrino has a Dirac mass, but when it has a Majorana mass then the constraints on 
mass ratios change considerably p, [l^. Also, for higher quark masses a simple perturbative 
analysis need no longer apply. 

We use Madgraph/MadEvent [19] to generate signal events and Alpgen [20] to generate
background events. The MLM parton-jet matching method is used with pTmin = 100 GeV for
the tt+nj samples and pTmin = 150 GeV for the W+nj samples. These choices are reasonable
given the high value of H_T ~ 2 m_q'. In principle there should be little dependence on pTmin
and we shall test this further below. Pythia [21] is used to simulate shower, fragmentation,
hadronization and decay processes. PGS [22] is used to simulate the detector effects and to
find jets, leptons, and missing energy in each event. We modify the PGS code slightly to use
the anti-kT jet-finding algorithm [23]. For the jet resolution parameter we choose R = 0.4.
Other possible backgrounds, and in particular the irreducible tbW background, have been
found to be small [13].
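
For orientation, the distance measures that define the anti-kT algorithm can be written down in a few lines. The following is only a minimal Python sketch of the standard definitions, d_ij = min(pT_i^-2, pT_j^-2) DeltaR_ij^2 / R^2 and d_iB = pT_i^-2, with R = 0.4 as chosen above. It is not the modified PGS code; the function names and the (pT, y, phi) tuple layout are assumptions made for illustration.

```python
import math

R = 0.4  # jet resolution parameter quoted in the text


def delta_r2(y1, phi1, y2, phi2):
    """Squared separation in the rapidity-azimuth plane."""
    dphi = abs(phi1 - phi2)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi  # wrap the azimuthal difference into [0, pi]
    return (y1 - y2) ** 2 + dphi ** 2


def antikt_dij(p1, p2, radius=R):
    """Anti-kT distance between two pseudojets given as (pT, y, phi)."""
    (pt1, y1, phi1), (pt2, y2, phi2) = p1, p2
    return min(pt1 ** -2, pt2 ** -2) * delta_r2(y1, phi1, y2, phi2) / radius ** 2


def antikt_dib(p):
    """Anti-kT beam distance of a pseudojet given as (pT, y, phi)."""
    return p[0] ** -2


# A soft particle close to a hard one (DeltaR = 0.3 < R) has d_ij below the
# hard particle's beam distance, so it is merged before the hard jet is closed.
hard = (200.0, 0.0, 0.0)
soft_near = (5.0, 0.3, 0.0)
soft_far = (5.0, 0.5, 0.0)
merges_near = antikt_dij(hard, soft_near) < antikt_dib(hard)
merges_far = antikt_dij(hard, soft_far) < antikt_dib(hard)
```

Because the distance is weighted by the inverse square of the larger pT, soft radiation clusters around hard cores first, which gives the regular, cone-like jets of radius R that make this algorithm convenient for boosted W decays.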



We adopt a few preselection rules: jets are required to have pT(j) > 20 GeV, there is
only one energetic lepton with pT(l) > 20 GeV (l = e, mu) and the missing energy satisfies
pT(miss) > 20 GeV. In Table I we provide the selection efficiency of the preselection cuts. After
the preselection cuts the backgrounds are about an order of magnitude larger than the
signal. Note that in order to generate the relevant background events more efficiently we
have imposed process dependent H_T cuts, chosen in such a way as not to significantly affect
the final results.
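
The preselection rules can be sketched as a simple event filter. This is a minimal Python illustration only: the event layout (a dictionary holding lists of jet and lepton pT values and a missing-energy value, all in GeV) and the function names are hypothetical and are not taken from PGS.

```python
PT_MIN = 20.0  # GeV, threshold applied to jets, the lepton and the missing energy


def selected_jets(event):
    """Return the jet pT values that pass the jet threshold."""
    return [pt for pt in event["jets"] if pt > PT_MIN]


def passes_preselection(event):
    """Require exactly one lepton (e or mu) above threshold and enough missing energy."""
    energetic_leptons = [pt for pt in event["leptons"] if pt > PT_MIN]
    return len(energetic_leptons) == 1 and event["met"] > PT_MIN


# Hypothetical events used only to exercise the filter.
good_event = {"jets": [180.0, 95.0, 60.0, 25.0, 12.0], "leptons": [45.0, 8.0], "met": 70.0}
dilepton_event = {"jets": [150.0, 80.0], "leptons": [45.0, 30.0], "met": 70.0}
low_met_event = {"jets": [150.0, 80.0], "leptons": [45.0], "met": 10.0}
```

Jet-multiplicity requirements and b tagging are applied separately in the analysis and are deliberately left out of this filter.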

For event reconstruction we minimize the following chi^2, where only the leading four jets
in each event are used. Jets carrying b tags (the one or two tagged jets, or the leading two
if there are more than two b tags) are excluded from the reconstruction of the hadronic W.

\chi^2 = \sum_{i=1} \frac{(\cdots)^2}{\sigma_W^2} + \frac{(\cdots)^2}{\sigma_{p_T}^2} + \sum_{i=1} \frac{(\cdots)^2}{\sigma_{p_T}^2} + \frac{(\cdots)^2}{\sigma_{t'}^2}    (1)

The t' is reconstructed quite well by taking sigma_W = 15, sigma_pT = 20, and sigma_t' = 25 GeV. The
results are similar to the CDF reconstruction method based on scanning a reference mass
[16], as demonstrated in Fig. (1a). The difference between the mass bump shapes of
these two methods can be partially accounted for by the fact that our method also takes into
account the possibility that the hadronic W is one jet. It is clear from Fig. (1b) that a
simple mass bump analysis is not adequate to separate signal from background.

                      b'b'     t't'     W + jets   tt + jets
pT(l) > 20 GeV        37%      29%      50%        26%
pT(miss) > 20 GeV     36%      28%      33%        24%
Events with 1 fb^-1   112.8    85.6     881.8      1285.3

TABLE I. Selection efficiencies of the preselection rules, given as the percentage of events
surviving each rule. In the last row the number of surviving events is normalized to 1 fb^-1
while assuming n_j > 3. The K factors, 1 for W + jets and 1.5 for the rest, have been included.

FIG. 1. (a) The reconstructed t' mass with the chi^2 given in Eq. (1) is compared with the CDF
reconstruction method. (b) The stacked reconstructed t' mass is shown with the background
included.



We use b tagging efficiencies of 0.6, 0.1 and 0.01 for b, c and light quarks respectively.
Table II shows that b tagging can effectively suppress the W + nj background. The resulting
significance is similar when the number of b tags is either n_b = 1 or n_b = 2, and therefore
we shall simply impose n_b > 0.
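
To see why the n_b > 0 requirement suppresses W + jets so strongly, one can work out the tag-multiplicity probabilities implied by the quoted efficiencies. The Python sketch below assumes that each jet is tagged independently and that four jets are considered; both the independence assumption and the example flavour compositions are illustrative choices, not details of the PGS tagging.

```python
from itertools import product

# Tagging efficiencies quoted in the text.
TAG_EFF = {"b": 0.6, "c": 0.1, "light": 0.01}


def tag_multiplicity_probs(flavours):
    """Probability of 0, 1, 2, ... b tags for jets with the given true flavours."""
    probs = [0.0] * (len(flavours) + 1)
    # Enumerate every tagged/untagged pattern and accumulate its probability.
    for pattern in product((0, 1), repeat=len(flavours)):
        p = 1.0
        for tagged, flavour in zip(pattern, flavours):
            eff = TAG_EFF[flavour]
            p *= eff if tagged else 1.0 - eff
        probs[sum(pattern)] += p
    return probs


# Two b jets plus two light jets (signal-like or tt-like) versus four light jets (W + jets-like).
heavy_flavour = tag_multiplicity_probs(["b", "b", "light", "light"])
all_light = tag_multiplicity_probs(["light"] * 4)
p_tag_heavy = 1.0 - heavy_flavour[0]  # probability of at least one tag
p_tag_light = 1.0 - all_light[0]
```

Under these assumptions an event with two true b jets keeps at least one tag most of the time, while an all-light event is tagged only at the few percent level, which is the qualitative pattern seen for W + jets in Table II.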

For the MLP neural network we choose 40 observables as input. The relatively large
number of observables will more accurately capture the phase space structure of both signal
and background events, and our particular choice of observables has been optimized by trial
and error. We consider three scenarios for how the b' events are used in the training of the
neural network. The results are presented in Table III where the degenerate t' and b' masses
are taken to be either 500 or 600 GeV.

FIG. 2. Stacked reconstructed masses after the MLP discriminant cut in Scenario 1.
Here the labels "w2mb", "tp2mb" and "tp1mb" correspond to the reconstructed hadronic W, the
reconstructed hadronic t' and the reconstructed semi-leptonic t'. All are based on the chi^2
defined in Eq. (1).

FIG. 3. Fig. (2) repeated for Scenario 3.

1) In the first scenario the neural network is trained by treating the b' events as
background. This scenario helps to establish how well the signals from t' and b' can be
distinguished. In Fig. (2) we show mass reconstructions for the hadronic W, the hadronic t' and
the semi-leptonic t' after applying the MLP discriminant cut. This figure demonstrates that
both our reconstruction methods and the MLP method work quite well. The discriminant
plot for this method is shown in Fig. (4a).

2) In the second scenario we ignore b' events in training. With respect to the neural
network discriminant the b' events are distributed quite uniformly, and the events above the
cut will contribute to the signal. The background rejection also improves a little and the
result is a better S/B.

            b'b'    t't'    W + jets   tt + jets   S/B     S/sqrt(S+B)
n_b = 0     36.3    20.1    813.6      433.0       0.02    0.55
n_b = 1     48.8    37.5    61.8       571.0       0.07    1.4
n_b = 2     27.7    28.0    6.4        310.5       0.1     1.5

TABLE II. Number of events (1 fb^-1) in samples of different tagged b jet multiplicity.

                       b'b'    t't'    W + jets   tt + jets   S/B    S/sqrt(S+B)
Scenario 1  500 GeV    4.4     34.9    5.6        15.4        1.4    4.5
Scenario 2  500 GeV    7.3     38.1    6.9        20.8        1.6    5.3
Scenario 3  500 GeV    33.7    41.8    7.8        55.6        1.2    6.4
Scenario 1  600 GeV    2.4     11.2    4.0        8.2         0.8    2.2
Scenario 2  600 GeV    3.0     11.5    3.9        9.3         1.1    2.8
Scenario 3  600 GeV    9.8     13.0    5.5        25.9        0.8    3.1
Scenario 1  500 GeV    7.2     36.2    13.1       31.7        0.7    3.8

TABLE III. Summary of the number of events (with 1 fb^-1) for the different scenarios for the
treatment of the b' events. The last row shows a sample result when using observables that do not
rely on a full reconstruction.

3) In the third scenario we train using both t' and b' events as signal. Note that the
reconstruction procedure remains the same and is still geared towards reconstructing the t'.
This shows up as what appears to be a reduced S/B in the three mass reconstructions of Fig.
(3). In addition the location of the effective t' mass peak shifts down, due to decay products
of the b' being missed. However the power of the neural network becomes apparent in this
case, since it resorts to other kinematic observables to distinguish signal from background.
Its success shows up in the nicely improved sensitivities in Table III.
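
The bookkeeping behind the three scenarios can be sketched in a few lines of Python. The counts are the 500 GeV rows of Table III; the dictionary and function names are illustrative only. The sketch assumes that b' events passing the cut are counted as background in Scenario 1 and as signal in Scenarios 2 and 3, a reading that reproduces the quoted S/B and S/sqrt(S+B) values to the precision given.

```python
import math

# Event counts after the discriminant cut, copied from Table III (500 GeV rows).
COUNTS = {
    1: {"bprime": 4.4, "tprime": 34.9, "wjets": 5.6, "ttjets": 15.4},
    2: {"bprime": 7.3, "tprime": 38.1, "wjets": 6.9, "ttjets": 20.8},
    3: {"bprime": 33.7, "tprime": 41.8, "wjets": 7.8, "ttjets": 55.6},
}

# Role of the b' sample in the network training; None means it is left out of training.
BPRIME_TRAINING_ROLE = {1: "background", 2: None, 3: "signal"}


def sensitivity(scenario):
    """Return (S/B, S/sqrt(S+B)) for the given scenario."""
    c = COUNTS[scenario]
    # Assumption: b' events count as signal unless they were trained as background.
    bprime_is_signal = BPRIME_TRAINING_ROLE[scenario] != "background"
    signal = c["tprime"] + (c["bprime"] if bprime_is_signal else 0.0)
    background = c["wjets"] + c["ttjets"] + (0.0 if bprime_is_signal else c["bprime"])
    return signal / background, signal / math.sqrt(signal + background)


results = {s: sensitivity(s) for s in COUNTS}
```

The same function applied to the 600 GeV rows gives the significances of Table III as well, so the gain from Scenario 1 to Scenario 3 comes mainly from the b' events added to the signal and only partly from changes in background rejection.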

For comparison we also show the performance that can be obtained without performing
any reconstruction. In this case we only make use of the kinematic observables adopted in the
top quark precision measurements [24]. The result is shown in the last row of Table III and
the comparison to the first row shows that observables associated with the reconstruction
are somewhat helpful to achieve a better sensitivity.

[Panel titles: (a) The neural network discriminant distribution; (b) The significance near the unitarity region]

FIG. 4. (a) The stacked MLP discriminant distributions for signal and background for Scenario 1
are displayed. (b) The sensitivities, S/sqrt(S+B), for the three scenarios at the early running of LHC
with 7 TeV are shown, where the x-axis is the mass of t' in GeV and the y-axis is the significance.

From the results in Table III we display in Fig. (4b) our final estimates for the sensitivity
of the LHC to the t' for masses close to the unitarity bound.

We note that there is an uncertainty on the overall normalization of the backgrounds due
to our reliance on a Monte Carlo estimate, and in the end a more data-driven approach to
background subtraction may be adopted by experimentalists. But to gain some confidence
in the Monte Carlo estimates we can test their stability by changing the parameter pTmin
used by Alpgen in the generation of the dominant tt+jets background. We generate samples
with jet multiplicities 0, 1, 2 and >= 3 for the two choices pTmin = 100 GeV (our choice above)
and pTmin = 20 GeV (a more time consuming choice). The characteristics such as the H_T
distribution of the combined samples for the two cases are very similar, as expected [13].
In Table IV we present the number of events passing the discriminant cut of the neural
network and we see that while the individual samples vary dramatically, the difference in
the total number of events is small. This indicates that not only is Alpgen performing well,
but there is no nontrivial dependence on pTmin arising through interaction between the event
generation and the neural network.
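
The size of the residual pTmin dependence can be read off directly from the entries of Table IV. The short Python sketch below sums the jet-multiplicity samples for each choice of pTmin and reports the relative shift of the total; the data layout and function name are illustrative, and the totals are recomputed from the rounded per-sample entries, so they can differ from the quoted totals in the last digit.

```python
# tt+jets events passing the discriminant cut, per jet-multiplicity sample (Table IV).
# Keys: (scenario, pTmin in GeV); values: counts for tt+0j, tt+1j, tt+2j, tt+(>=3j).
TABLE_IV = {
    (1, 100): [4.2, 8.4, 2.6, 0.2],
    (1, 20): [1.4, 4.3, 4.2, 4.6],
    (2, 100): [5.7, 10.9, 4.0, 0.2],
    (2, 20): [2.0, 4.8, 4.6, 7.0],
    (3, 100): [11.5, 28.9, 15.1, 0.2],
    (3, 20): [2.1, 8.2, 20.0, 26.0],
}


def relative_shift(scenario):
    """Relative change of the summed tt+jets yield between pTmin = 100 and 20 GeV."""
    total_100 = sum(TABLE_IV[(scenario, 100)])
    total_20 = sum(TABLE_IV[(scenario, 20)])
    return abs(total_100 - total_20) / total_100


shifts = {scenario: relative_shift(scenario) for scenario in (1, 2, 3)}
```

Individual samples change by large factors between the two settings (the >= 3 jet sample most of all), yet the summed yields stay within roughly ten percent of each other, which is the stability referred to in the text.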



              pTmin      tt + 0j   tt + j    tt + 2j   tt + (>= 3j)   total
Scenario 1    100 GeV    4.2       8.4       2.6       0.2            15.4
Scenario 1    20 GeV     1.4       4.3       4.2       4.6            14.6
Scenario 2    100 GeV    5.7       10.9      4.0       0.2            20.8
Scenario 2    20 GeV     2.0       4.8       4.6       7.0            18.4
Scenario 3    100 GeV    11.5      28.9      15.1      0.2            55.6
Scenario 3    20 GeV     2.1       8.2       20.0      26.0           56.4

TABLE IV. Number of events passing the neural network discriminant cut for two different
pTmin values, shown for the three scenarios.

Next we explore the future Tevatron bounds by assuming an integrated luminosity of 10
fb^-1. The main results are shown in Table V where the constraint on n_b is released (due
to the relatively poor performance of b tagging at the Tevatron) while n_j > 4 is applied.
Here we see an even bigger increase in the sensitivities in the progression through the three
scenarios. Thus if the goal is to discover or rule out a fourth family one should allow both
the t' and the b' to contribute to a signal. We see again that the neural network can actively
pull out the combined signal from background even with a reconstruction method that is
geared to the t'.





              b'b'     t't'     W + jets   tt + jets   S/B     S/sqrt(S+B)
Scenario 1    15.19    30.52    8.8        41.6        0.46    3.11
Scenario 2    22.10    38.76    13.5       38.0        1.18    5.74
Scenario 3    56.0     41.2     12.3       94.3        0.91    6.81

TABLE V. Tevatron results with sqrt(s) = 1.96 TeV and 10 fb^-1. Here m_t' = m_b' = 400 GeV and
n_j > 4.



In the current CDF t' search [16] the possible existence of a b' is not considered. A b'
can generate events that could be interpreted as t' events, especially given that a bin with
5 or more jets is kept. We have noted above that the b' events can produce a distribution
in the reconstructed mass M_rec that is somewhat broader and lower than for the t' events,
assuming equal masses. The net effect could be to roughly double the cross section for the
production of events in the high M_rec and H_T bins. This suggests that a reanalysis of the
CDF data may be warranted, especially given the slight excess already seen in the high bins.

We have considered a search for t' type heavy quarks during the early running of the
LHC with collision energy 7 TeV and integrated luminosity 1 fb^-1 and found sensitivity to
quark masses close to the unitarity upper bound, which is in the 500 to 600 GeV range.
Any enhancement of the collision energy or luminosity would give good prospects for the
discovery of a fourth family even at the high end of this mass range. In order to obtain
more reliable estimates of the required luminosities a more detailed experimental analysis
and a full detector simulation are needed, which is beyond the scope of this work.

We have noted that the sensitivity depends on how the b' events are treated in the
analysis. If the goal is to set limits on the existence of a sequential fourth family then it
is advantageous to enhance the b' contribution to the signal. A multivariate tool such as a
neural network is an efficient way to accomplish this since it can account for the different
characteristics of t' and b' events simultaneously. The CDF collaboration has presented the
limits m_t' > 335 GeV [16] and m_b' > 385 GeV [25] in separate analyses. We are suggesting
that their sensitivity to the existence of a nearly degenerate quark doublet of a fourth family
should be greater than these numbers indicate.